There are 113,937 observations with 81 variables to analyze. The variables range from basic loan and applicant information to detailed descriptions of the loan backer and financial institution. Most variables are numeric; string variables are stored as factors.
Many different case studies are possible, from trends across lower-income and upper-income customers, to the products customers seek to finance, to the creditworthiness of the borrower and/or the loan.
My goal is to consider the loan status and see which variables seem to affect it and help identify trends in a borrower’s loan status. Specifically, I am interested in discovering which factors help indicate and predict the riskiness of a loan and the likelihood of the loan defaulting or being charged off.
Below are some initial data investigations. Factors include income range, employment status, credit score of the borrower, purpose of the loan, loan amount, and many others, as explored below and throughout this first phase of exploring the data.
Make preliminary changes to data set
## $1-24,999
## 0.0638423
## 4
## 0.02102039
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229 25
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
##
## NC HR E D C B A AA
## 141 3508 3289 5153 5649 4389 3315 3509
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
## 36
## 0.7704082
## Employed
## 0.8387091
##
## False True
## 56459 57478
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 9.5 669.5 689.5 695.1 729.5 889.5 591
Loan Status: 83.2% of loans are in good standing, while 14.9% of loans are in bad standing and 1.8% of loans are late.
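The three-way split above depends on how raw status labels are collapsed into good, bad, and late groups. The following is a minimal sketch of that grouping step, not the code used for this report. The raw labels in `GOOD` and `BAD`, the helper names, and the toy list are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical raw status labels; the real data set may use different ones.
GOOD = {"Current", "Completed", "FinalPaymentInProgress"}
BAD = {"Chargedoff", "Defaulted", "Cancelled"}


def status_group(raw_status):
    """Collapse a raw loan status label into good / bad / late / other."""
    if raw_status in GOOD:
        return "good"
    if raw_status in BAD:
        return "bad"
    if raw_status.startswith("Past Due"):
        return "late"
    return "other"


def group_shares(statuses):
    """Return the percentage of loans falling in each status group."""
    counts = Counter(status_group(s) for s in statuses)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


# Toy input (made up, not taken from the data set): 8 good, 1 bad, 1 late.
toy = ["Current"] * 6 + ["Completed"] * 2 + ["Chargedoff", "Past Due (1-15 days)"]
shares = group_shares(toy)
print(shares)
```

Keeping the mapping in one place makes it easy to revisit borderline choices, such as whether late loans should count as bad.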
Income Distributions: Borrowers’ income brackets follow a fairly normal distribution. Low and middle-income earners make up the bulk of the dataset, with the $25,000-49,999 and $50,000-74,999 brackets making up 28% and 27% respectively. The upper income brackets of $75,000-99,999 and $100,000+ make up about 15% each. About 8% of borrowers earn below $25,000 or report no income.
Loan Analysis: By far the most common loan purpose is debt consolidation, which makes up over 50% of all reported loans. Besides Other (uncategorized) loans, home improvement, business, and personal loans are the next largest categories, each accounting for about 3-7% of loans. Loan rates follow a roughly normal, slightly right-skewed distribution with a median rate of about 21%. Loan amounts are strongly right-skewed, with a median of $6,500 and a mean of $8,337. The median monthly loan payment is $218, and the most common term is 36 months (about 77% of loans).
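A quick way to check the skew claim for loan amounts is to compare the mean with the median: a mean well above the median (here $8,337 versus $6,500) points to a long right tail. The sketch below illustrates that check under stated assumptions. The function name is hypothetical, and the sample is just the five quartile values from the loan amount summary used as a stand-in, not the full data.

```python
import statistics


def skew_direction(values):
    """Classify skew by comparing mean and median (a rough heuristic)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "symmetric"


# Stand-in sample: min, 1st quartile, median, 3rd quartile, max of loan amount.
loan_amounts = [1000, 4000, 6500, 12000, 35000]
direction = skew_direction(loan_amounts)
print(direction)
```

This heuristic only indicates direction. A histogram is still the better tool for judging how strong the skew is.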
Borrower’s Profile: Over 83% of borrowers had some kind of formal employment, while the remaining 17% were either unemployed or had ambiguous or unavailable employment information. About half of borrowers were homeowners.
Credit Analysis: Borrowers have a median credit score of 689.5 and a strongly right-skewed debt-to-income ratio with a median of 0.22. Both the open credit lines and total credit inquiries are also strongly right-skewed. The loan’s credit grade follows a somewhat normal distribution centered on grade C, the most common grade.
I am interested in investigating any correlation to the loan status. Specifically, which factors impact the likelihood of a loan resulting in a bad status? How well have we anticipated potential bad loans? Which borrowers and which factors help indicate if the loan will get paid off? What makes a loan risky?
After analyzing several variables against loan status, borrower’s interest rates, and credit scores, we are able to draw several simple conclusions.
Interest Rate: Interest rates and loan status have an inverse relationship. Loans with higher interest rates were more likely to be in bad standing than loans with lower interest rates. This is not too surprising, as riskier loans carry higher interest rates and therefore cost more.
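One way to quantify this relationship is to bin loans by interest rate and compute the share of bad loans in each band. The following is a minimal sketch of that idea, not the method used for this report. The function name, the 10-point band width, and the toy loans are assumptions made up for illustration.

```python
from collections import defaultdict


def bad_rate_by_apr_band(loans, band_width_pct=10):
    """Share of bad loans per APR band.

    loans: iterable of (apr, is_bad) pairs, with apr as a fraction (0.21 = 21%).
    Returns {band lower bound in percent: share of bad loans in that band}.
    """
    totals = defaultdict(int)
    bads = defaultdict(int)
    for apr, is_bad in loans:
        # Work in integer tenths of a percent to avoid float boundary issues.
        per_mille = round(apr * 1000)
        band = (per_mille // (band_width_pct * 10)) * band_width_pct
        totals[band] += 1
        bads[band] += 1 if is_bad else 0
    return {band: bads[band] / totals[band] for band in sorted(totals)}


# Toy loans (made up, not taken from the data set).
toy_loans = [
    (0.08, False),
    (0.12, False), (0.15, False), (0.18, True),
    (0.22, False), (0.27, True),
    (0.31, True), (0.35, True), (0.33, False),
]
rates = bad_rate_by_apr_band(toy_loans)
print(rates)
```

If the bad-loan share rises steadily from the lowest band to the highest, that supports the pattern described above.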
Loan Amount: Loan amounts and loan status tend to have a direct relationship. Most of the loans in bad standing were for less than $5,000, whereas larger loans were more likely to stay current or be paid off. Loan amounts also have a direct relationship with income brackets. Again, this is not too surprising, as higher-income borrowers would presumably be less likely to borrow smaller amounts of money.
Loan Categories: As previously mentioned, most loans were for the purpose of debt consolidation. After analyzing the 20 different loan categories and their corresponding loan statuses, income brackets, interest rates, and loan amounts, I determined that further multivariate analysis needs to be conducted.
Other Observations: There seems to be a direct relationship between homeownership and loan status. Although homeowners represent about half of the borrowers, more homeowners had loans in good standing and fewer had loans in bad standing compared with non-homeowners.
APR Relationships:
- Directly related to debt-to-income ratio
- Inversely related to income
- Inversely related to credit score

Credit Score Relationships:
- Inversely related to APR (as mentioned above)
- Directly related to credit grade of loan
- Ambiguous relationship to debt-to-income ratio (requires further analysis)
- Ambiguous relationship to employment (requires further analysis)
- Directly related to income
Bivariate analysis was able to show a few relationships between the loan type, borrower’s profile, and the loan status. However, we will need further analysis to make better predictions and conclusions as to what factors are the true indicators of a loan’s status.
Now that we have analyzed multiple variables together, we are able to see some clearer patterns, relationships, and trends.
Loan Amount: In nearly every scenario analyzed, the loan amount was directly related to the loan status, irrespective of any other factor. However, one discovery contradicted this. For loans categorized as debt consolidation, higher loan amounts, even across income brackets, were more likely to be in bad standing. This defied the trend seen in nearly every other loan category.
APR: One of the clearest relationships we see is that as credit scores increase, the borrower’s APR decreases and the likelihood of the loan being in good standing increases.
Homeownership: While homeowners seem to be borrowing at similar rates and only slightly higher loan amounts than non-homeowners, non-homeowners were more likely to have a bad loan status than otherwise similar homeowners.
Employment: The safest borrowers were those listed as employed, while the riskiest were those listed as full-time or retired. Full-time employees were equally risky across loan amounts and income brackets.
Income: As mentioned before, higher income earners borrowed more money at lower rates. They also borrowed across diverse categories and were more likely to have their loans in good standing.
This visualization demonstrates a clear relationship between a borrower’s credit score, their interest rate, and their loan status. In general, borrowers with higher credit scores borrowed money at lower rates and were more likely to be in good standing. The inverse is also true: loans in bad standing were likely to have lower credit scores and higher APR.
This plot shows a very clear relationship between a borrower’s stated income, the amount borrowed, and the status of the loan. In general, higher income borrowers borrow more money and are more likely to be in good standing. Conversely, loans in bad standing are more likely to involve borrowers with lower incomes and smaller loan amounts.
Given what we have seen, we know that in general, borrowers with higher credit scores and higher incomes borrow more money at lower rates, resulting in a strong direct relationship with good loan status. This visualization shows an exception to this pattern. For loans used for debt consolidation, a loan is just as likely to be in bad standing across all income brackets and loan amounts. With this loan purpose accounting for over 50% of all loans represented in this data set, this is an important discovery to consider.
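An exception like this only shows up when the bad-loan rate is broken down by more than one variable at a time. The sketch below shows one way to compute such a breakdown, grouping by loan category and income bracket together. It is not the code behind the plot. The field names (`category`, `income`, `bad`) and the toy rows are hypothetical, and the rows are arranged to mimic the described pattern rather than taken from the data.

```python
from collections import defaultdict


def bad_rate_by_group(rows, keys):
    """Share of bad loans for each combination of the given keys.

    rows: iterable of dicts with a boolean "bad" field (hypothetical schema).
    keys: field names to group by, e.g. ("category", "income").
    """
    totals = defaultdict(int)
    bads = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += 1
        bads[group] += 1 if row["bad"] else 0
    return {group: bads[group] / totals[group] for group in totals}


# Toy rows (made up for illustration only).
toy_rows = [
    {"category": "Debt Consolidation", "income": "low", "bad": True},
    {"category": "Debt Consolidation", "income": "low", "bad": False},
    {"category": "Debt Consolidation", "income": "high", "bad": True},
    {"category": "Debt Consolidation", "income": "high", "bad": False},
    {"category": "Home Improvement", "income": "low", "bad": True},
    {"category": "Home Improvement", "income": "low", "bad": False},
    {"category": "Home Improvement", "income": "high", "bad": False},
    {"category": "Home Improvement", "income": "high", "bad": False},
]
by_group = bad_rate_by_group(toy_rows, ("category", "income"))
for group, rate in sorted(by_group.items()):
    print(group, rate)
```

In the toy data, the bad-loan rate for debt consolidation stays flat across income brackets while the other category improves with income, which is the kind of contrast a single-variable summary would hide.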
With so much data to analyze, I had to do numerous bivariate data explorations and manipulations in order to find patterns and move forward. Performing a few multivariate visualizations made patterns much easier to spot and identify.